Video Covariance Matrix Logarithm for Human Action Recognition in Videos
Abstract
In this paper, we propose a new local spatio-temporal descriptor for videos and a new approach to action recognition in videos based on this descriptor. The new descriptor is called the Video Covariance Matrix Logarithm (VCML). The VCML descriptor is based on a covariance matrix representation and models relationships between different low-level features, such as intensity and gradient. We apply the VCML descriptor to encode the appearance information of local spatio-temporal video volumes, which are extracted by Dense Trajectories. We then present an extensive evaluation of the proposed VCML descriptor with Fisher vector encoding and Support Vector Machines on four challenging action recognition datasets. We show that the VCML descriptor achieves better results than the state-of-the-art appearance descriptors. Moreover, we show that the VCML descriptor carries information complementary to the HOG descriptor, and that their fusion gives a significant improvement in action recognition accuracy. Finally, we show that the VCML descriptor improves action recognition accuracy in comparison to the state-of-the-art Dense Trajectories, and that the proposed approach achieves superior performance to the state-of-the-art methods.
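The core idea of a covariance-matrix-logarithm descriptor can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `covariance_log_descriptor`, the feature set (intensity plus first-order gradients along x, y, and t), the regularization constant `eps`, and the upper-triangle vectorization are all assumptions chosen for illustration. The abstract only states that the descriptor is built from a covariance matrix of low-level features such as intensity and gradient.

```python
import numpy as np

def covariance_log_descriptor(volume, eps=1e-6):
    """Hypothetical sketch: log-covariance descriptor of a video volume.

    volume: array of shape (T, H, W) holding grayscale intensities,
            with at least 2 samples along every axis.
    Returns the upper triangle of log(C) as a 1-D vector.
    """
    volume = np.asarray(volume, dtype=np.float64)

    # Assumed low-level features per voxel: intensity and the
    # first-order gradients along time, height, and width.
    gt, gy, gx = np.gradient(volume)
    features = np.stack([volume, gx, gy, gt], axis=0).reshape(4, -1)

    # d x d covariance of the features over all voxels (rows = variables).
    cov = np.cov(features)

    # Small ridge so the matrix is strictly positive definite (assumption).
    cov = cov + eps * np.eye(cov.shape[0])

    # Matrix logarithm of a symmetric positive-definite matrix through
    # its eigendecomposition: log(C) = U diag(log(lambda)) U^T.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, eps)  # guard against round-off
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T

    # log(C) is symmetric, so its upper triangle carries all the information.
    rows, cols = np.triu_indices(log_cov.shape[0])
    return log_cov[rows, cols]
```

With four features the sketch returns a 10-dimensional vector per volume. Taking the logarithm maps the covariance matrix into a vector space where standard Euclidean tools, such as the Fisher vector encoding mentioned in the abstract, can be applied.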
Similar resources
Action Recognition Based on Spatio-temporal Log-Euclidean Covariance Matrix
In this paper, we address the problem of human action recognition by combining covariance matrices as local spatio-temporal (ST) descriptors with local ST features extracted densely from action videos. Unlike traditional methods that use gradient-based and optical flow-based features separately, we use a covariance matrix to fuse the two types of features. Since covariance matrices are S...
Action Change Detection in Video Based on HOG
Background and Objectives: Action recognition, the process of labeling an unknown action in a query video, is a challenging problem due to event complexity, variations in imaging conditions, and intra- and inter-individual action variability. A number of solutions have been proposed to solve the action recognition problem. Many of these frameworks assume that each video sequence includes only one ...
Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels
A novel framework for action recognition in video using empirical covariance matrices of bags of low-dimensional feature vectors is developed. The feature vectors are extracted from segments of silhouette tunnels of moving objects and coarsely capture their shapes. The matrix logarithm is used to map the segment covariance matrices, which live in a nonlinear Riemannian manifold, to the vector s...
Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study
Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...
Action Recognition Using Log-covariance Matrices of Silhouette and Optical-flow Features
Algorithms for recognizing human actions in a video sequence are needed in applications such as video surveillance and video search and retrieval. Developing algorithms that are not only accurate but also efficient is challenging due to the complexity of the task and the sheer size of video. In this thesis, we develop a general framework for compactly representing, quickly comparing, and accura...
Publication date: 2015